Truth Discovery Algorithms: An Experimental Evaluation

Authors

  • Dalia Attia Waguih
  • Laure Berti-Équille
Abstract

A fundamental problem in data fusion is determining the veracity of multi-source data in order to resolve conflicts. While previous work in truth discovery has proved useful in practice for specific settings, source behaviors, or data set characteristics, there has been limited systematic comparison of the competing methods in terms of efficiency, usability, and repeatability. We remedy this deficit by providing a comprehensive review of 12 state-of-the-art algorithms for truth discovery. We provide reference implementations and an in-depth evaluation of the methods based on extensive experiments on synthetic and real-world data. We analyze aspects of the problem that have not been explicitly studied before, such as the impact of initialization and parameter setting, convergence, and scalability. We provide an experimental framework for extensively comparing the methods across a wide range of truth discovery scenarios, in which source coverage, the number and distribution of conflicts, and true positive claims can be controlled and used to evaluate the quality and performance of the algorithms. Finally, we report comprehensive findings obtained from the experiments and provide new insights for future research.

Similar resources

Exploring Relevance as Truth Criterion on the Web and Classifying Claims in Belief Levels

The Web has become the most important information source for most of us. Unfortunately, there is no guarantee for the correctness of information on the Web. Moreover, different websites often provide conflicting information on a subject. Several truth discovery methods have been proposed for various scenarios, and they have been successfully applied in diverse application domains. In this paper...

Pay-as-you-go Feedback in Data Quality Systems

In many domains such as the web, sensor networks and social media, sources often provide conflicting information. It is of utmost importance to resolve conflicts and identify correct information. A number of approaches, referred to as truth finders, have been proposed recently. They address the problem of truth discovery using different principles such as link analysis, Bayesian modeling and re...

Scalable Uncertainty-Aware Truth Discovery in Big Data Social Sensing Applications for Cyber-Physical Systems

Social sensing is a new big data application paradigm for Cyber-Physical Systems (CPS), where a group of individuals volunteer (or are recruited) to report measurements or observations about the physical world at scale. A fundamental challenge in social sensing applications lies in discovering the correctness of reported observations and reliability of data sources without prior knowledge on ei...

In Search of the Consensus Among Musical Pattern Discovery Algorithms

Patterns are an essential part of music and there are many different algorithms that aim to discover them. Based on the improvements brought by using data fusion methods to find the consensus of algorithms on other MIR tasks, we hypothesize that fusing the output from musical pattern discovery algorithms will improve the pattern discovery results. In this paper, we explore two methods to combin...

Using Causal Discovery to Track Information Flow in Spatio-Temporal Data - A Testbed and Experimental Results Using Advection-Diffusion Simulations

Causal discovery algorithms based on probabilistic graphical models have emerged in geoscience applications for the identification and visualization of dynamical processes. The key idea is to learn the structure of a graphical model from observed spatio-temporal data, which indicates information flow, thus pathways of interactions, in the observed physical system. Studying those pathways allows...

Journal title:
  • CoRR

Volume abs/1409.6428

Publication date 2014